Extended Sessions for Isolated (Orchestrations) #449
Merged
sophiatev merged 46 commits into main from stevosyan/extended-sessions-for-orchestrations-isolated on Sep 17, 2025
Conversation
…c facing method without the extended sessions parameter
This was referenced Jul 14, 2025
andystaples (Collaborator) requested changes on Aug 1, 2025 and left a comment:
Left some comments
bachuv reviewed on Aug 1, 2025
cgillum (Member) reviewed on Sep 11, 2025 and left a comment:
A few things I think we should address in this PR before merging.
cgillum approved these changes on Sep 15, 2025
…perties are not specified, also added another test to make sure that extended sessions aren't stored if isExtendedSessions is false
andystaples approved these changes on Sep 17, 2025
cgillum approved these changes on Sep 17, 2025
This PR enables extended sessions for orchestrations in the .NET isolated worker model. This is achieved via an IMemoryCache, which keeps the extended session state in memory. A cache entry is evicted if it has not been accessed for the user-specified extendedSessionIdleTimeoutInSeconds, which is communicated to the dotnet SDK via the Properties field of the OrchestratorRequest. The Properties field is also used to communicate:
- ExtendedSession: whether the orchestration request is part of an extended session. If so, the worker attempts to get the extended session state from the cache, and if it cannot find it, adds it to the cache.
- IncludePastEvents: whether past orchestration events were included in the orchestration request.

The latter is necessary to detect an edge case: the worker has already evicted an extended session from its cache, but the host assumes the worker still has the state in memory and does not include an orchestration history with the request. This should be very rare, but it can happen if, for example, there is a network delay when sending the orchestration request to the worker. If there is no orchestration history (this is the first execution of the orchestration), then everything is fine: the worker plays the new events and creates a cache entry for the extended session (IncludePastEvents will be true, but there is simply nothing to include). If there is an orchestration history but it was not included with the request (IncludePastEvents is false), the worker will not attempt to execute the request, since it lacks the history it needs to replay the orchestration up to the current execution. Instead, it sets the new requiresHistory field in the OrchestratorResponse to true. The host then ends the extended session via a SessionAbortedException, so the next time the orchestration request is sent, a history will be included.

Other PRs:
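To make the worker-side flow described at the top of this PR concrete, here is a minimal sketch in Python (chosen for brevity; the real implementation is C#). SlidingCache is a hypothetical stand-in for IMemoryCache with a sliding expiration, and Request, Response and handle are illustrative names, not the SDK's types. The ExtendedSession and IncludePastEvents key strings come from the description; the idle-timeout key name and the defaults used when a property is missing are assumptions. Keeping the keys as module-level constants is one possible shape for the string-literal concern raised under Open questions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Property keys. ExtendedSession / IncludePastEvents are taken from the PR text;
# the idle-timeout key name is a hypothetical placeholder.
EXTENDED_SESSION = "ExtendedSession"
INCLUDE_PAST_EVENTS = "IncludePastEvents"
IDLE_TIMEOUT = "ExtendedSessionIdleTimeoutInSeconds"


class SlidingCache:
    """Hypothetical stand-in for IMemoryCache with a sliding expiration:
    an entry is dropped if it has not been accessed for `timeout` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, tuple] = {}  # key -> (value, last_access, timeout)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_access, timeout = entry
        now = self._clock()
        if now - last_access >= timeout:
            del self._entries[key]  # idle for too long: evict
            return None
        self._entries[key] = (value, now, timeout)  # each access restarts the idle timer
        return value

    def set(self, key: str, value, timeout: float) -> None:
        self._entries[key] = (value, self._clock(), timeout)


@dataclass
class Request:
    instance_id: str
    past_events: List[str]
    new_events: List[str]
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Response:
    requires_history: bool = False
    replayed: Optional[List[str]] = None  # events the orchestration has now seen


def handle(request: Request, cache: SlidingCache) -> Response:
    # Assumed defaults when a property is absent: no extended session, history included.
    extended = bool(request.properties.get(EXTENDED_SESSION, False))
    include_past = bool(request.properties.get(INCLUDE_PAST_EVENTS, True))
    timeout = float(request.properties.get(IDLE_TIMEOUT, 30))  # hypothetical default

    # Only consult the cache when the request is part of an extended session.
    state = cache.get(request.instance_id) if extended else None
    if state is None:
        if not include_past:
            # The host assumed the session was still in memory, but it was evicted:
            # ask the host for a history instead of executing the request.
            return Response(requires_history=True)
        state = list(request.past_events)  # replay from the supplied history (may be empty)
    state = state + request.new_events
    if extended:
        cache.set(request.instance_id, state, timeout)  # never stored outside a session
    return Response(replayed=state)
```

A cache hit lets the worker continue from the in-memory state, while a miss with IncludePastEvents set to false yields requires_history=True, which is the signal the host turns into a SessionAbortedException.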
Open questions
Currently, we extract the properties from the OrchestratorRequest.Properties field using string literals. This is not ideal; we would rather define constants for these property keys. The problem is that WebJobs creates this Properties field from the field names of the RemoteOrchestratorConfiguration class, and we do not want to import that class into the SDK just to determine what these string literals should be.
Any ideas on how to remedy this? Should we construct the OrchestratorRequest.Properties field another way?
Design Callout
There are several ways the worker could inform the host that it needs an orchestration history when the worker has ended the extended session before the host. The worker evicts an extended session once the user-specified extendedSessionIdleTimeoutInSeconds expires; more precisely, if the extended session is not accessed within that time frame, the worker evicts it from the cache. In the meantime, the worker has to send the result of the orchestration work item back to the host, the host has to process it and commit it to storage, then wait for new orchestration messages, and then send the worker the next work item once new messages arrive. From the host's perspective, extendedSessionIdleTimeoutInSeconds applies only to the time spent waiting for new orchestration messages before ending the extended session. In other words, the two idle timers start at different moments, so there may be an appreciable number of situations where the worker ends the extended session before the host and requires a history.

The current approach is perhaps the simplest: the worker informs the host via the requiresHistory flag that it needs a history, and the host then throws a SessionAbortedException from OutOfProcMiddleware. DT.Core already has all the logic to handle a SessionAbortedException and retry the work item. The downsides of this approach are that the SessionAbortedException surfaces in the logs and may alarm customers, and that there is an added delay because DT.Core aborts the orchestration work item entirely and retries it later. That said, the number of situations where this occurs could perhaps be reduced by advising customers to increase their extendedSessionIdleTimeoutInSeconds in the isolated model.

An alternative is to have the worker call StreamInstanceHistory when it has already ended the extended session and needs a history. This would not surface an exception to the customer and would add less delay to processing the work item. However, it could be much more complicated to implement, as I am not sure how to get a client over to the GrpcOrchestrationRunner to accomplish this, and it may require additional edge-case handling for all the network issues that could arise.
Manual testing done thus far
SessionAbortedException. The work item will then be retried.
Performance testing
Two scenarios were run in-process with extended sessions (ES) enabled and disabled, and in isolated with extended sessions enabled and disabled. Multiple trials were run, and in each setting the time it took to complete the orchestration was recorded, as well as the number of times the history was loaded (with extendedSessionIdleTimeoutInSeconds set to 30 seconds). The two scenarios tested were … SessionAbortedException being thrown (the worker ends the extended session before the host and, as expected, the host throws the exception, which leads to a retry of the work item, this time with a history attached). In in-process, there was just one history load. Without ES, there were about 100 history loads in each.
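The Design Callout explains why the worker can end a session before the host: the worker's idle timer starts at its last cache access, while the host's starts only after it has committed the result. The small Python model below illustrates that timing and shows when the requiresHistory path of the second scenario is taken. It is a simplified sketch under stated assumptions: classify and its parameter names are hypothetical, and the delays used are made up, not measurements from the runs above.

```python
def classify(timeout: float, commit_delay: float, message_gap: float, send_delay: float) -> str:
    """Classify the next work item under a simple two-timer model.

    Times are in seconds, measured from the worker's last access to the
    cache entry (t = 0), which is when the worker's idle timer starts.
      commit_delay: time for the host to receive and commit the worker's result
      message_gap:  time after that commit until the next orchestration message arrives
      send_delay:   time to deliver the next work item to the worker
    """
    # The host's idle timer only starts once the result has been committed.
    if message_gap >= timeout:
        return "host ended session"  # the next request is sent with the full history
    # The worker's idle timer has been running since t = 0.
    arrival_at_worker = commit_delay + message_gap + send_delay
    if arrival_at_worker >= timeout:
        return "requires history"  # worker evicted, host still thinks the session is alive
    return "cache hit"


# Hypothetical delays: 1.5 s to commit, 0.5 s to deliver the next work item.
for gap in (10.0, 28.0, 29.5, 31.0):
    print(gap, classify(timeout=30.0, commit_delay=1.5, message_gap=gap, send_delay=0.5))
```

Under this model, requiresHistory occurs only when the next message arrives in a window of width commit_delay + send_delay just before the timeout. Raising extendedSessionIdleTimeoutInSeconds shifts that window later, which helps when new messages usually arrive soon after a commit, but it does not close the window.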